Description

BACKGROUND OF THE INVENTION 

[0001] The relationship between structure and function of macromolecules is of fundamental importance in the understanding of biological systems. These relationships are important to understanding, for example, the functions of enzymes, structural proteins and signalling proteins, ways in which cells communicate with each other, as well as mechanisms of cellular control and metabolic feedback.

[0002] Genetic information is critical in continuation of life processes. Life is substantially informationally based and its genetic content controls the growth and reproduction of the organism and its complements. The amino acid sequences of polypeptides, which are critical features of all living systems, are encoded by the genetic material of the cell. Further, the properties of these polypeptides, e.g., as enzymes, functional proteins, and structural proteins, are determined by the sequence of amino acids which make them up. As structure and function are integrally related, many biological functions may be explained by elucidating the underlying structural features which provide those functions, and these structures are determined by the underlying genetic information in the form of polynucleotide sequences. Further, in addition to encoding polypeptides, polynucleotide sequences also can be involved in control and regulation of gene expression. It therefore follows that the determination of the make-up of this genetic information has achieved significant scientific importance.

[0003] As a specific example, diagnosis and treatment of a variety of disorders may often be accomplished through identification and/or manipulation of the genetic material which encodes for specific disease associated traits. In order to accomplish this, however, one must first identify a correlation between a particular gene and a particular trait. This is generally accomplished by providing a genetic linkage map through which one identifies a set of genetic markers that follow a particular trait. These markers can identify the location of the gene encoding for that trait within the genome, eventually leading to the identification of the gene. Once the gene is identified, methods of treating the disorder that results from that gene, i.e., as a result of overexpression, constitutive expression, mutation, underexpression, etc., can be more easily developed.

[0004] One class of genetic markers includes variants in the genetic code termed "polymorphisms." In the course of evolution, the genome of a species can collect a number of variations in individual bases. These single base changes are termed single-base polymorphisms. Polymorphisms may also exist as stretches of repeating sequences that vary as to the length of the repeat from individual to individual. Where these variations are recurring, e.g., exist in a significant percentage of a population, they can be readily used as markers linked to genes involved in mono- and polygenic traits. In the human genome, single-base polymorphisms occur roughly once per 300 bp. Though many of these variant bases appear too infrequently among the allele population for use as genetic markers (i.e., <1%), useful polymorphisms (e.g., those occurring in 20 to 50% of the allele population) can be found approximately once per kilobase. Accordingly, in a human genome of approximately 3 Gb, one would expect to find approximately 3,000,000 of these "useful" polymorphisms.
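By way of illustration only, the arithmetic behind these estimates can be sketched as follows. The constant and function names are hypothetical and form no part of the described method; the figures themselves (a genome of about 3 Gb, one single-base polymorphism per roughly 300 bp, one useful polymorphism per kilobase) are those stated in the paragraph above.

```python
GENOME_SIZE_BP = 3_000_000_000      # approximately 3 Gb
BP_PER_POLYMORPHISM = 300           # any single-base polymorphism
BP_PER_USEFUL_POLYMORPHISM = 1_000  # polymorphisms usable as markers


def expected_sites(genome_size_bp, bp_per_site):
    """Expected number of sites, given one site per bp_per_site bases."""
    return genome_size_bp // bp_per_site


all_sites = expected_sites(GENOME_SIZE_BP, BP_PER_POLYMORPHISM)
useful_sites = expected_sites(GENOME_SIZE_BP, BP_PER_USEFUL_POLYMORPHISM)
print(all_sites)     # 10000000
print(useful_sites)  # 3000000
```

On these figures, roughly three in ten single-base polymorphisms would be frequent enough to serve as markers.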

[0005] The use of polymorphisms as genetic linkage markers is thus of critical importance in locating, identifying and characterizing the genes which are responsible for specific traits. In particular, such mapping techniques allow for the identification of genes responsible for a variety of disease or disorder-related traits which may be used in the diagnosis and/or eventual treatment of those disorders. Given the size of the human genome, as well as those of other mammals, it would generally be desirable to provide methods of rapidly identifying and screening for polymorphic genetic markers. The present invention meets these and other needs.

SUMMARY OF THE INVENTION 

[0006] The present invention generally provides methods useful in screening large numbers of polymorphic markers in a genome. In particular, the present invention provides a method of identifying whether a target nucleic acid sequence includes a polymorphic variant comprising:

hybridising said target nucleic acid sequence to an array of oligonucleotide probes, said array comprising at least one detection block of probes, said detection block including first and second groups of probes that are complementary to said target nucleic acid sequence having first and second variants of said polymorphism, respectively, and further comprising third and fourth groups of probes, said third and fourth groups of probes having sequences identical to said first and second groups of probes, respectively, except that said third and fourth groups of probes include all possible monosubstitutions of positions in said sequence that are within n bases of a base in said sequence that is complementary to said polymorphism, wherein n is from 1 to 5;

determining hybridisation intensities of probes in the group;

calculating a ratio (PM(x) - MM(x)) / (PM(y) - MM(y)), wherein PM(x) is the average hybridisation intensity of probes that are perfectly complementary to the first variant of the polymorphism, MM(x) is the average hybridisation intensity of probes that are complementary to the first variant except for a single mismatch, PM(y) is the average hybridisation intensity of probes that are perfectly complementary to the second variant of the polymorphism, MM(y) is the average hybridisation intensity of probes that are complementary to the second variant except for a single mismatch; and

characterising the polymorphism as homozygous for the first variant, homozygous for the second variant or heterozygous for the first and second variants from the ratio.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Figure 1 shows a schematic illustration of light-directed synthesis of oligonucleotide arrays. 
[0008] Figure 2A shows a schematic representation of a single oligonucleotide array containing 78 separate detection blocks. Figure 2B shows a schematic illustration of a detection block for a specific polymorphism denoted WI-567. Figure 2B also shows the triplet layout of detection blocks for the polymorphism employing 20-mer oligonucleotide probes having substitutions 7, 10 and 13 bp from the 3' end of the probe. The probes present in the shaded portions of each detection block are shown adjacent to each detection block.

[0009] Figure 3 illustrates a tiling strategy for a polymorphism denoted WI-567, and having the sequence 5'-TGCTGCCTTGGTTC[A/G]AGCCCTCATCTCTTT-3' (SEQ ID NO: 1). A detection block specific for the WI-567 polymorphism is shown with the probe sequences tiled therein listed above. Predicted patterns for both homozygous forms and the heterozygous form are shown at the bottom.

[0010] Figure 4 shows a schematic representation of a detection block specific for the polymorphism denoted WI-1959 having the sequence 5'-ACCAAAAATCAGTC[T/C]GGGTAACTGAGAGTG-3' (SEQ ID NO: 2) with the polymorphism indicated by the brackets. A fluorescent scan of hybridization of the heterozygous and both homozygous forms is shown in the center, with the predicted hybridization pattern for each being indicated below.

[0011] Figure 5 illustrates an example of a computer system used to execute the software of the present invention which determines whether polymorphic markers in DNA are heterozygote, homozygote with a first polymorphic marker or homozygote with a second polymorphic marker.

[0012] Figure 6 shows a system block diagram of computer system 1 used to execute the software of the present invention.

[0013] Figure 7 shows a probe array including probes with base substitutions at base positions within two base positions of the polymorphic marker. The position of the polymorphic marker is denoted P0, and it may have one of two polymorphic markers x and y (where x and y are one of A, C, G, or T).

[0014] Figure 8 shows a probe array including probes with base substitutions at base positions within two base positions of the polymorphic marker.

[0015] Figure 9 shows a high level flowchart of analyzing intensities to determine whether polymorphic markers in 
DNA are heterozygote, homozygote with a first polymorphic marker or homozygote with a second polymorphic marker. 
[0016] Figure 10A shows a tiling arrangement of an array tiled for detecting 246 different polymorphic markers, both sense and antisense strands. Each different polymorphism detection block is indicated by a number representing a specific, preidentified polymorphism. Figure 10B shows a fluorescent scan of the array following exposure to fluorescently labelled target sequence.

DETAILED DESCRIPTION OF THE INVENTION 

I. General

[0017] The present invention generally provides rapid and efficient methods for screening samples of genomic material for polymorphisms, and arrays specifically designed for carrying out these analyses. In particular, the present invention relates to the identification and screening of single base polymorphisms in a sample. In general, the methods of the present invention employ arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from an individual (e.g., a human or other mammal), which target sequences include specific identified polymorphisms, or "polymorphic markers." The probes are typically arranged in detection blocks, each block being capable of discriminating the three genotypes for a given marker, e.g., the heterozygote or either of the two homozygotes. The method allows for rapid, automatable analysis of genetic linkage to even complex polygenic traits.

[0018] Oligonucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. These oligonucleotide arrays, also described as "Genechips™," have been generally described in the art, for example, U.S. Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis methods which incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis methods. See Fodor et al., Science, 251:767-777 (1991), Pirrung et al., U.S. Patent No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication No. WO 92/10092 and U.S. Patent No. 5,424,186. Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Patent No. 5,384,261.

[0019] The basic strategy for light directed synthesis of oligonucleotides on a VLSIPS™ array is outlined in Figure 1. The surface of a substrate or solid support, modified with photosensitive protecting groups (X), is illuminated through a photolithographic mask, yielding reactive hydroxyl groups in the illuminated regions. A selected nucleotide, typically in the form of a 3'-O-phosphoramidite-activated deoxynucleoside (protected at the 5' hydroxyl with a photosensitive protecting group), is then presented to the surface and coupling occurs at the sites that were exposed to light. Following capping and oxidation, the substrate is rinsed and the surface is illuminated through a second mask, to expose additional hydroxyl groups for coupling. A second selected nucleotide (e.g., 5'-protected, 3'-O-phosphoramidite-activated deoxynucleoside) is presented to the surface. The selective deprotection and coupling cycles are repeated until the desired set of products is obtained. Pease et al., Proc. Natl. Acad. Sci. (1994) 91:5022-5026. Since photolithography is used, the process can be readily miniaturized to generate high density arrays of oligonucleotide probes. Furthermore, the sequence of the oligonucleotides at each site is known.

II. Identification of Polymorphisms 

[0020] The methods and arrays of the present invention primarily find use in the identification of so-called "useful" polymorphisms (i.e., those that are present in approximately 20% or more of the allele population). The present invention also relates to the detection or screening of specific variants of previously identified polymorphisms.
[0021] A wide variety of methods can be used to identify specific polymorphisms. For example, repeated sequencing of genomic material from large numbers of individuals, although extremely time consuming, can be used to identify such polymorphisms. Alternatively, ligation methods may be used, where a probe having an overhang of defined sequence is ligated to a target nucleotide sequence derived from a number of individuals. Differences in the ability of the probe to ligate to the target can reflect polymorphisms within the sequence. Similarly, restriction patterns generated from treating a target nucleic acid with a prescribed restriction enzyme or set of restriction enzymes can be used to identify polymorphisms. Specifically, a polymorphism may result in the presence of a restriction site in one variant but not in another. This yields a difference in restriction patterns for the two variants, and thereby identifies a polymorphism. Oligonucleotide arrays may also be used to identify polymorphisms. For example, as described in U.S. Patent Application Serial No. 08/485,606, filed June 7, 1995, polymorphisms may be identified using type-IIs endonucleases to capture and amplify ambiguous base sequences adjacent the restriction sites. The captured sequences are then characterized on oligonucleotide arrays. The patterns of these captured sequences are compared from various individuals, the differences being indicative of potential polymorphisms. Alternative array-based methods may also be used to identify polymorphisms, including the methods described in U.S. Patent Application No. 08/629,031, filed April 8, 1996. Briefly, these methods hybridize a target nucleic acid against an appropriately tiled array, e.g., having probes complementary to step-wise segments of the target sequence. The ratio of hybridization intensity of perfectly matched probes to mismatched probes is plotted as a function of the position that is being interrogated in the sequence, for each individual screened. Where a polymorphism is present, it yields a discrepancy between the data plotted for the individuals, e.g., a point of separation of the two or more individuals' plots.

[0022] In a preferred aspect, the identification of polymorphisms takes into account the assumption that a useful polymorphism (i.e., one that occurs in 20 to 50% of the allele population) occurs approximately once per 1 kb in a given genome. In particular, random sequences of a genome, e.g., random 1 kb sequences of the human genome such as expressed sequence tags or "ESTs", can be sequenced from a limited number of individuals. When a variant base is detected with sufficient frequency, it is designated a "useful" polymorphism. In practice, the method generally analyzes the same 1 kb sequence from a small number of unrelated individuals, i.e., from 3 to 5 individuals (6 to 10 alleles). Where a variant sequence is identified, it is then compared to a separate pool of material from unrelated individuals. Where the variant sequence identified from the first set of individuals is detectable in the pool of the second set, it is assumed to exist at a sufficiently high frequency, e.g., at least about 20% of the allele population, thereby qualifying as a useful marker for genetic linkage analysis.
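As a rough illustration of why so small a panel can be informative, the following sketch estimates the chance that a variant is observed at least once among the sampled alleles. It assumes, purely for illustration, that alleles are drawn independently from the population; the function name is hypothetical and this calculation is not stated in the description itself.

```python
def detection_probability(allele_frequency, n_alleles):
    """Chance that at least one of n_alleles independently sampled
    alleles carries a variant present at the given frequency."""
    return 1.0 - (1.0 - allele_frequency) ** n_alleles


for individuals in (3, 5):
    alleles = 2 * individuals  # diploid: two alleles per individual
    p = detection_probability(0.20, alleles)
    print(f"{individuals} individuals ({alleles} alleles): {p:.3f}")
```

Under that independence assumption, a variant at 20% allele frequency would be seen in roughly 74% of three-person panels and roughly 89% of five-person panels.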

III. Screening Polymorphisms 

[0023] Screening polymorphisms in samples of genomic material according to the methods of the present invention is generally carried out using arrays of oligonucleotide probes. These arrays may generally be "tiled" for a large number of specific polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide probes which is made up of a sequence complementary to the target sequence of interest, as well as preselected variations of that sequence, e.g., substitution of one or more given positions with one or more members of the basis set of monomers, i.e., nucleotides. Tiling strategies are discussed in detail in Published PCT Application No. WO 95/11995. By "target sequence" is meant a sequence which has been identified as containing a polymorphism, and more particularly, a single-base polymorphism, also referred to as a "biallelic base." It will be understood that the term "target sequence" is intended to encompass the various forms present in a particular sample of genomic material, i.e., both alleles in a diploid genome.

[0024] In a particular aspect, arrays are tiled for a number of specific, identified polymorphic marker sequences. In particular, the array is tiled to include a number of detection blocks, each detection block being specific for a specific polymorphic marker or set of polymorphic markers. For example, a detection block may be tiled to include a number of probes which span the sequence segment that includes a specific polymorphism. To ensure probes that are complementary to each variant, the probes are synthesized in pairs differing at the biallelic base.
[0025] In addition to the probes differing at the biallelic bases, monosubstituted probes are also generally tiled within the detection block. These monosubstituted probes have bases at and up to a certain number of bases in either direction from the polymorphism, substituted with the remaining nucleotides (selected from A, T, G, C or U). Typically, the probes in a tiled detection block will include substitutions of the sequence positions up to and including those that are 5 bases away from the base that corresponds to the polymorphism. Preferably bases up to and including those in positions 2 bases from the polymorphism will be substituted. The monosubstituted probes provide internal controls for the tiled array, to distinguish actual hybridization from artifactual cross-hybridization. An example of this preferred substitution pattern is shown in Figure 3.
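The substitution pattern described above can be sketched in a few lines. The following is a minimal illustration, not the tiling software of the invention: the function names are hypothetical, the probe spans the whole supplied window rather than being trimmed to a 20-mer, and at the polymorphic position the sketch skips any substitution that would simply reproduce the perfect match for the other variant (an assumption made here so that such probes are not counted among the mismatch controls).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_detection_block(left_flank, alleles, right_flank, n=2):
    """Build perfect-match and monosubstituted probes for each allele.

    Substitutions are made at every probe position within n bases of
    the base complementary to the polymorphism.
    """
    allele_complements = {COMPLEMENT[a] for a in alleles}
    center = len(right_flank)  # probe index pairing with the polymorphism
    block = {}
    for allele in alleles:
        perfect = reverse_complement(left_flank + allele + right_flank)
        first = max(0, center - n)
        last = min(len(perfect) - 1, center + n)
        mismatches = []
        for pos in range(first, last + 1):
            for base in "ACGT":
                if base == perfect[pos]:
                    continue  # same base, so not a substitution
                if pos == center and base in allele_complements:
                    continue  # would be the other variant's perfect match
                mismatches.append(perfect[:pos] + base + perfect[pos + 1:])
        block[allele] = {"perfect": perfect, "mismatches": mismatches}
    return block


# WI-567 flanking sequence as given for Figure 3.
block = tile_detection_block("TGCTGCCTTGGTTC", "AG", "AGCCCTCATCTCTTT", n=2)
for allele, probes in block.items():
    print(allele, probes["perfect"], len(probes["mismatches"]))
```

With n = 2 this yields, for each variant, three substitutions at each of the four flanking positions plus two at the polymorphic position, i.e. 14 single-mismatch probes per variant.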

[0026] A variety of tiling configurations may also be employed to ensure optimal discrimination of perfectly hybridizing probes. For example, a detection block may be tiled to provide probes having optimal hybridization intensities with minimal cross-hybridization. For example, where a sequence downstream from a polymorphic base is G-C rich, it could potentially give rise to a higher level of cross-hybridization or "noise," when analyzed. Accordingly, one can tile the detection block to take advantage of more of the upstream sequence. Such alternate tiling configurations are schematically illustrated in Figure 2B, bottom, where the base in the probe that is complementary to the polymorphism is placed at different positions in the sequence of the probe relative to the 3' end of the probe. For ease of discussion, both the base which represents the polymorphism and the complementary base in the probe are referred to herein as the "polymorphic base" or "polymorphic marker."

[0027] Optimal tiling configurations may be determined for any particular polymorphism by comparative analysis. For example, triplet or larger detection blocks like those illustrated in Figure 2B may be readily employed to select such optimal tiling strategies.

[0028] Arrays may be tiled for one or both strands of the target sequence, i.e., the sequence including the polymorphism. The inclusion of probes that hybridize to both the sense and antisense strands, either on a single array or separate arrays, provides an additional level of verification for a given interrogation. Thus, in addition to probes that are complementary to one strand of a target sequence, a detection block will also include probes that are complementary to the antisense strand of the target sequence, and which are therefore complementary to the first group of probes.
[0029] Additionally, arrays will generally be tiled to provide for ease of reading and analysis. For example, the probes 
tiled within a detection block will generally be arranged so that reading across a detection block, the probes are tiled 
in succession, i.e., progressing along the target sequence one or more bases at a time (See, e.g., Figure 3, middle). 

[0030] Once an array is appropriately tiled for a given polymorphism or set of polymorphisms, the target nucleic acid is hybridized with the array and scanned. Hybridization and scanning are generally carried out by methods described in, e.g., Published PCT Application Nos. WO 92/10092 and WO 95/11995, and U.S. Patent No. 5,424,186. In brief, a target nucleic acid sequence which includes one or more previously identified polymorphic markers is amplified by well known amplification techniques, e.g., PCR. Typically, this involves the use of primer sequences that are complementary to the two strands of the target sequence both upstream and downstream from the polymorphism. Asymmetric PCR techniques may also be used, i.e., where an array is tiled for only a sense or antisense strand. Amplified target, generally incorporating a label, is then hybridized with the array under appropriate conditions. Incorporation of a label generally involves incorporating a labeled nucleotide into the amplification reaction, whereby the label is incorporated into the target nucleic acid. Typically useful labels include fluorescent labels coupled to nucleotides, as well as well known binding groups, e.g., biotin, streptavidin and the like, to which a labelled complement may be later bound.

[0031] Upon completion of hybridization and washing of the array, the array is scanned to determine the position on the array to which the target sequence hybridizes. The hybridization data obtained from the scan, i.e., in the form of fluorescence intensities or some other detectable label or dye, is then plotted as a function of location on the array.
[0032] Although primarily described in terms of a single detection block, e.g., for detection of a single polymorphism, in preferred aspects, the arrays of the invention will include multiple detection blocks, and thus be capable of analyzing multiple, specific polymorphisms. For example, preferred arrays will generally include from about 50 to about 4000 different detection blocks with particularly preferred arrays including from 100 to 3000 different detection blocks.
[0033] In alternate arrangements, it will generally be understood that detection blocks may be grouped within a single array or in multiple, separate arrays so that varying, optimal conditions may be used during the hybridization of the target to the array. For example, it may often be desirable to provide for the detection of those polymorphisms that fall within G-C rich stretches of a genomic sequence, separately from those falling in A-T rich segments. This allows for the separate optimization of hybridization conditions for each situation.

IV. Calling 

[0034] After hybridization and scanning, the hybridization data from the scanned array is then analyzed to identify which variant or variants of the polymorphic marker are present in the sample, or target sequence, as determined from the probes to which the target hybridized, e.g., one of the two homozygote forms or the heterozygote form. This determination is termed "calling" the genotype. Calling the genotype is typically a matter of comparing the hybridization data for each potential variant, and based upon that comparison, identifying the actual variant (for homozygotes) or variants (for heterozygotes) that are present. In one aspect, this comparison involves taking the ratio of hybridization intensities (corrected for average background levels) for the expected perfectly hybridizing probes for a first variant versus that of the second variant. Where the marker is homozygous for the first variant, this ratio will be a large number, theoretically approaching an infinite value. Where homozygous for the second variant, the ratio will be a very low number, i.e., theoretically approaching zero. Where the marker is heterozygous, the ratio will be approximately 1. These numbers are, as described, theoretical. Typically, the first ratio will be well in excess of 1, i.e., 2, 4, 5 or greater. Similarly, the second ratio will typically be substantially less than 1, i.e., 0.5, 0.2, 0.1 or less. The ratio for heterozygotes will typically be approximately equal to 1, i.e., from 0.7 to 1.5. These ratios can vary based upon the specific sequence surrounding the polymorphism, and can also be adjusted based upon a standard hybridization with a control sample containing the variants of the polymorphism. The ratio may be put on a linear scale by taking the log10 of the ratio and multiplying the result by 10. This makes it easier to interpret the results of the comparison of the intensities observed.
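The comparison described in this paragraph might be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the default cut-offs (2 or greater, 0.5 or less, and 0.7 to 1.5) are simply the typical values quoted above and would in practice depend on the sequence context and control hybridizations, and flooring background-corrected intensities at zero and returning "no call" for ratios between the bands are assumptions of the sketch.

```python
import math


def intensity_ratio(pm_x, pm_y, background=0.0):
    """Background-corrected ratio of variant-x to variant-y intensity."""
    x = max(pm_x - background, 0.0)  # assumption: floor at zero
    y = max(pm_y - background, 0.0)
    if y == 0.0:
        return math.inf if x > 0.0 else math.nan
    return x / y


def log_scaled(ratio):
    """Ten times the base-10 logarithm of a positive, finite ratio."""
    return 10.0 * math.log10(ratio)


def call_genotype(ratio, hom_x_min=2.0, hom_y_max=0.5, het_range=(0.7, 1.5)):
    """Classify a ratio using illustrative cut-offs from the text."""
    if math.isnan(ratio):
        return "no call"
    if ratio >= hom_x_min:
        return "homozygous x"
    if ratio <= hom_y_max:
        return "homozygous y"
    if het_range[0] <= ratio <= het_range[1]:
        return "heterozygous"
    return "no call"  # falls between the typical bands


for pm_x, pm_y in [(520.0, 120.0), (120.0, 520.0), (320.0, 300.0)]:
    r = intensity_ratio(pm_x, pm_y, background=20.0)
    print(call_genotype(r), round(log_scaled(r), 2))
```

On the log scale a heterozygote sits near 0, while the two homozygotes fall symmetrically on either side (a ratio of 5 and a ratio of 0.2 map to values of equal magnitude and opposite sign), which is what makes the scaled value easier to read than the raw ratio.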
[0035] The quality of a given call for a particular genotype may also be checked. For example, the maximum perfect match intensity can be divided by a measure of the background noise (which may be represented by the standard deviation of the mismatched intensities). Where the ratio exceeds some preselected cut-off point, the call is determined to be good. For example, where the maximum intensity of the expected perfect matches exceeds twice the noise level, it might be termed a good call. In an additional aspect, the present invention provides software for performing the above described comparisons.
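A quality check of this kind could be sketched as below. The function name is hypothetical, the cut-off of 2 is the example value given above, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which is meant.

```python
import math
import statistics


def call_quality(perfect_match, mismatch, cutoff=2.0):
    """Return (score, is_good), where score is the maximum perfect-match
    intensity divided by the standard deviation of mismatch intensities."""
    noise = statistics.pstdev(mismatch)  # assumption: population std dev
    if noise == 0.0:
        score = math.inf  # no measurable spread among the mismatch cells
    else:
        score = max(perfect_match) / noise
    return score, score > cutoff


score, good = call_quality([900.0, 1100.0], [100.0, 120.0, 80.0, 100.0])
print(round(score, 1), good)

weak_score, weak_good = call_quality([30.0], [10.0, 50.0, 10.0, 50.0])
print(weak_score, weak_good)
```

In the second example the strongest perfect-match cell is only 1.5 times the noise level, so the call would not be accepted under a cut-off of 2.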

[0036] Fig. 5 illustrates an example of a computer system used to execute the software of the present invention which determines whether polymorphic markers in DNA are heterozygote, homozygote with a first variant of a polymorphism or homozygote with a second variant of a polymorphism. Fig. 5 shows a computer system 1 which includes a monitor 3, screen 5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have one or more buttons such as mouse buttons 13. Cabinet 7 houses a CD-ROM drive 15 or a hard drive (not shown) which may be utilized to store and retrieve software programs incorporating the present invention, digital images for use with the present invention, and the like. Although a CD-ROM 17 is shown as the removable media, other removable tangible media including floppy disks, tape, and flash memory may be utilized. Cabinet 7 also houses familiar computer components (not shown) such as a processor, memory, and the like.

[0037] Fig. 6 shows a system block diagram of computer system 1 used to execute the software of the present invention. As in Fig. 5, computer system 1 includes monitor 3 and keyboard 9. Computer system 1 further includes subsystems such as a central processor 102, system memory 104, I/O controller 106, display adapter 108, removable disk 112, fixed disk 116, network interface 118, and speaker 120. Other computer systems suitable for use with the present invention may include additional or fewer subsystems. For example, another computer system could include more than one processor 102 (i.e., a multi-processor system) or a cache memory.

[0038] Arrows such as 122 represent the system bus architecture of computer system 1. However, these arrows are illustrative of any interconnection scheme serving to link the subsystems. For example, a local bus could be utilized to connect the central processor to the system memory and display adapter. Computer system 1 shown in Fig. 6 is but an example of a computer system suitable for use with the present invention. Other configurations of subsystems suitable for use with the present invention will be readily apparent to one of ordinary skill in the art.

[0039] Fig. 7 shows a probe array including probes with base substitutions at base positions within two base positions of the polymorphic marker. The position of the polymorphic marker is denoted P0, and it may have one of two variants of the polymorphic marker, x and y (where x and y are one of A, C, G, or T). As indicated, at P-2 there are two columns of four cells which contain a base substitution two base positions to the left, or 3', from the polymorphic marker. The column denoted by an "x" contains polymorphic marker x and the column denoted by a "y" contains polymorphic marker y.

[0040] Similarly, P-1 contains probes with base substitutions one base position to the left, or 3', of the polymorphic marker. P0 contains probes with base substitutions at the polymorphic marker position. Accordingly, the two columns in P0 are identical. P1 and P2 contain base substitutions one and two base positions to the right, or 5', of the polymorphic marker, respectively.

[0041] As a hypothetical example, assume a single base polymorphism exists where one allele contains the subsequence TCAAG whereas another allele contains the subsequence TCGAG, where the third base (underlined in the original) indicates the polymorphism in each allele. Fig. 8 shows a probe array including probes with base substitutions at base positions within two base positions of the polymorphic marker. In the first two columns, the cells which contain probes with base A (complementary to T in the alleles) two positions from the left of the polymorphic marker are shaded. They are shaded to indicate that it is expected that these cells would exhibit the highest hybridization to the labeled sample nucleic acid. Similarly, the second two columns have cells shaded which have probes with base G (complementary to C in the alleles) one position to the left of the polymorphic marker.

[0042] At the polymorphic marker position (corresponding to P0 in Fig. 7), there are two columns: one denoted by an "A" and one denoted by a "G". Although, as indicated earlier, the probes in these two columns are identical, the probes contain base substitutions for the polymorphic marker position. An "N" indicates the cells that have probes which are expected to exhibit a strong hybridization if the allele contains a polymorphic marker A. As will become apparent in the following paragraphs, "N" stands for numerator because the intensity of these cells will be utilized in the numerator of an equation. Thus, the labels were chosen to aid the reader's understanding of the present invention.
[0043] A "D" indicates the cells that have probes which are expected to exhibit a strong hybridization if the allele contains a polymorphic marker G. "D" stands for denominator because the intensity of these cells will be utilized in the denominator of an equation. The "n" and "d" labeled cells indicate these cells contain probes with a single base mismatch near the polymorphic marker. As before, the labels indicate where the intensity of these cells will be utilized in a following equation.

[0044] Fig. 9 shows a high level flowchart of analyzing intensities to determine whether polymorphic markers in DNA are heterozygote, homozygote with a first polymorphic marker or homozygote with a second polymorphic marker. At step 202, the system receives the fluorescent intensities of the cells on the chip. Although in a preferred embodiment, the hybridization of the probes to the sample is determined from fluorescent intensities, other methods and labels including radioactive labels may be utilized with the present invention. An example of one embodiment of a software program for carrying out this analysis is reprinted in Software Appendix A.

[0045] A perfect match (PM) average for a polymorphic marker x is determined by averaging the intensity of the cells
at P0 that have the base substitution equal to x in Fig. 7. Thus, for the example in Fig. 8, the perfect match average
for A would add the intensities of the cells denoted by "N" and divide the sum by 2.

[0046] A mismatch (MM) average for a polymorphic marker x is determined by averaging the intensity of the cells
that contain the polymorphic marker x and a single base mismatch in Fig. 7. Thus, for the example in Fig. 8, the
mismatch average for A would be the sum of the intensities of the cells denoted by "n" divided by 14.
[0047] A perfect match average and mismatch average for polymorphic marker y are determined in a similar manner
utilizing the cells denoted by "D" and "d", respectively. Therefore, the perfect match averages are an average intensity
of cells containing probes that are perfectly complementary to an allele. The mismatch averages are an average
intensity of cells containing probes that have a single base mismatch near the polymorphic marker in an allele.
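The averaging described in paragraphs [0045] to [0047] can be sketched as follows. This is only an illustration in Python, not the appendix program: the function name `match_averages`, the dictionary layout keyed by the cell labels "N", "n", "D" and "d", and the intensity values are all assumptions made for the example.

```python
from statistics import mean


def match_averages(cells):
    """Return PM and MM averages for markers x and y.

    `cells` maps a cell label ("N", "n", "D", "d") to the list of
    intensities read from the cells carrying that label (hypothetical layout).
    """
    return {
        "pm_x": mean(cells["N"]),  # perfect match average for x
        "mm_x": mean(cells["n"]),  # mismatch average for x
        "pm_y": mean(cells["D"]),  # perfect match average for y
        "mm_y": mean(cells["d"]),  # mismatch average for y
    }


# Made-up intensities: 2 perfect match cells and 14 mismatch cells per marker.
example_cells = {
    "N": [900.0, 1100.0],
    "n": [100.0] * 14,
    "D": [80.0, 120.0],
    "d": [90.0] * 14,
}
averages = match_averages(example_cells)
```

With 2 cells labeled "N" and 14 labeled "n", the divisors 2 and 14 mentioned above fall out of the list lengths rather than being hard-coded.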
[0048] At step 204, the system calculates a Ratio of the perfect match and mismatch averages for x to the perfect
match and mismatch averages for y. The numerator of the Ratio is the perfect match average for x minus the
mismatch average for x. In a preferred embodiment, if the resulting numerator is less than 0, the numerator is set
equal to 0.

[0049] The denominator of the Ratio is the perfect match average for y minus the mismatch average for y.
In a preferred embodiment, if the resulting denominator is less than or equal to 0, the denominator is set equal to a
minimum value, i.e., 0.00001.
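A minimal sketch of the Ratio of step 204, including both clamping rules, might look like the following. The function name `ratio` and the constant name `MIN_DENOMINATOR` are illustrative choices; only the value 0.00001 and the clamping rules come from the text.

```python
MIN_DENOMINATOR = 0.00001  # minimum value named in paragraph [0049]


def ratio(pm_x, mm_x, pm_y, mm_y):
    """Ratio = (PM(x) - MM(x)) / (PM(y) - MM(y)) with the clamps described above."""
    numerator = pm_x - mm_x
    if numerator < 0:
        numerator = 0.0  # negative numerators are set to 0
    denominator = pm_y - mm_y
    if denominator <= 0:
        denominator = MIN_DENOMINATOR  # avoid division by zero or a negative ratio
    return numerator / denominator


balanced = ratio(1000.0, 100.0, 1000.0, 100.0)   # equal signal for x and y
no_x_signal = ratio(50.0, 100.0, 1000.0, 100.0)  # numerator clamped to 0
no_y_signal = ratio(1000.0, 100.0, 50.0, 100.0)  # denominator clamped to the minimum
```

Clamping the denominator to a small positive value keeps the Ratio finite and non-negative, so a logarithm can be taken afterwards once a zero Ratio is handled.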

[0050] Once the system has calculated the Ratio, the system calculates DB at step 206. DB is calculated by the
equation DB = 10*log10(Ratio). The logarithmic function puts the ratio on a linear scale and makes it easier to interpret
the results of the comparison of intensities.
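The DB calculation of step 206 can be illustrated with a short sketch. The substitution of a small positive value when the Ratio is 0 is an assumption borrowed from the way the appendix program appears to treat a zero ratio; the names `db_score` and `zero_substitute` are hypothetical.

```python
import math


def db_score(ratio, zero_substitute=0.00001):
    """DB = 10 * log10(Ratio); a Ratio of 0 is replaced by a small value first."""
    if ratio <= 0:
        ratio = zero_substitute  # assumption: keeps the logarithm defined
    return 10.0 * math.log10(ratio)


db_equal = db_score(1.0)      # Ratio of 1
db_hundred = db_score(100.0)  # Ratio of 100
db_zero = db_score(0.0)       # Ratio of 0, substituted before taking the logarithm
```

Under this assumption a Ratio of 0 maps to a DB of about -50, which matches the practical bound on DB mentioned in the description.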

[0051] At step 208, the system performs a statistical check on the data or hybridization intensities. The statistical
check is performed to determine if the data will likely produce good results. In a preferred embodiment, the statistical
check involves testing whether the maximum of the perfect match averages for x or y is at least twice as great as the
standard deviation of the intensities of all the cells containing a single base mismatch (i.e., denoted by an "n" or "d" in
Fig. 8). If the perfect match average is at least two times greater than this standard deviation, the data is likely to
produce good results and this is communicated to the user.
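The statistical check of step 208 can be sketched as below. The use of the population standard deviation is an assumption (the appendix program appears to compute it from sums and sums of squares), and the function name and sample values are invented for the illustration.

```python
from statistics import pstdev


def passes_statistical_check(pm_x, pm_y, mismatch_intensities):
    """True if the larger PM average is at least twice the mismatch standard deviation.

    `mismatch_intensities` holds the intensities of all cells labeled "n" or "d".
    """
    spread = pstdev(mismatch_intensities)  # population standard deviation (assumed)
    return max(pm_x, pm_y) >= 2 * spread


# 28 made-up mismatch intensities with a standard deviation of exactly 10.
mismatches = [90.0, 110.0] * 14
strong_signal = passes_statistical_check(1000.0, 100.0, mismatches)
weak_signal = passes_statistical_check(15.0, 10.0, mismatches)
```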

[0052] The system analyzes DB at step 210 to determine if DB is approaching -∞, near 0, or approaching +∞. In
practice, the DB will typically not go beyond 50 or -50. If DB is approaching negative infinity (e.g., -50), the system
determines that the sample DNA contains a homozygote with a first polymorphic marker corresponding to x at step
212. If DB is near 0, the system determines that the sample DNA contains a heterozygote corresponding to both
polymorphic markers x and y at step 214. Although described as approaching ∞, etc., as described previously, these
numbers will generally vary, but are nonetheless indicative of the calls described. If DB is approaching positive infinity
(e.g., +50), the system determines that the sample DNA contains a homozygote with a second polymorphic marker
corresponding to y at step 216.

[0053] A visual inspection of the Ratio equation in step 204 shows that the numerator should be higher than the
denominator if the DNA sample only has the polymorphic marker corresponding to x. Similarly, the denominator should
be higher than the numerator if the DNA sample only has a polymorphic marker corresponding to y. If the DNA sample
has both polymorphic markers, indicating a heterozygote, the Ratio should be approximately equal to 1, which results
in a 0 when the logarithm of the Ratio is calculated.

[0054] The equations discussed above illustrate just one embodiment of the present invention. These equations
have correctly identified polymorphic markers when a visual inspection would seem to indicate a different result. This
may be the case because the equations take into account the mismatch intensities in order to determine the presence
or absence of the polymorphic markers. Additional methods may also be employed to properly compare the
hybridization intensities, including, e.g., principal component analysis and the like. One may select the strand of the target
sequence to optimize the ability to call a particular genome. Alternatively, one may analyze both strands, in parallel,
to provide greater amounts of data from which a call can be made. Additionally, the analyses, i.e., amplification and
scanning, may be performed using DNA, RNA, mixed polymers, and the like.

[0055] The present invention is further illustrated by the following examples. These examples are merely to illustrate 
aspects of the present invention and are not intended as limitations of this invention. 

V. Examples 


Example 1- Chip Tiling 

[0056] A DNA chip is prepared which contains three detection blocks for each of 78 identified single base
polymorphisms or biallelic markers, in a segment of human DNA (the "target" nucleic acid). Each detection block contains
probes wherein the identified polymorphism occurs at the position in the target nucleic acid complementary to the 7th,
10th and 13th positions from the 3' end of 20-mer oligonucleotide probes. A schematic representation of a single
oligonucleotide array containing all 78 detection blocks is shown in Figure 2A.

[0057] The tiling strategy for each block substitutes bases at the polymorphic position and at positions up to two
bases in either direction from the polymorphism. In addition to the substituted positions, the oligonucleotides are
synthesized in pairs differing at the biallelic base. Thus, the layout of the detection block (containing 40 different
oligonucleotide probes) allows for controlled comparison of the sequences involved, as well as simple readout without
need for complicated instrumentation. A schematic illustration of this tiling strategy within a single detection block is
shown in Figure 3, for a specific polymorphic marker denoted WI-567.
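The tiling of a single detection block can be sketched as follows. This is an illustration under stated assumptions rather than the synthesis procedure itself: the function name `detection_block`, the dictionary keys, and the T, G, C, A row order are hypothetical, and the example probe is the block 20/10 probe for WI-567 as it appears in the figure text, with the unsubstituted base G filled in at the substituted position.

```python
BASES = "TGCA"  # row order assumed for illustration


def detection_block(probe, poly_index, alleles=("T", "C"), flank=2):
    """Enumerate the cells of one detection block.

    Each cell is keyed by (allele, offset, base): the probe carries `allele`
    at the polymorphic position and `base` at `offset` bases from it.
    """
    cells = {}
    for allele in alleles:
        # Probe that is perfectly complementary to this allele.
        template = probe[:poly_index] + allele + probe[poly_index + 1:]
        for offset in range(-flank, flank + 1):
            pos = poly_index + offset
            for base in BASES:
                cells[(allele, offset, base)] = (
                    template[:pos] + base + template[pos + 1:]
                )
    return cells


# Block 20/10 probe for WI-567 (written 3' to 5'), polymorphic base at index 10.
probe_20_10 = "CGGAACCAAGCTCGGGAGTA"
block = detection_block(probe_20_10, poly_index=10)
```

Two alleles, five positions and four bases give the 40 cells of the block. At offset 0 the substitution overwrites the allelic base, so the two allele columns hold identical probes there, as noted in the description of the polymorphic marker position.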

Example 2- Detection of Polymorphisms

[0058] A target nucleic acid is generated from PCR products amplified by primers flanking the markers. These
amplicons can be produced singly or in multiplexed reactions. Target can be produced as ss-DNA by asymmetric PCR
from one primer flanking the polymorphism, as ds-DNA, or as RNA transcribed in vitro from promoters linked to the
primers. Fluorescent or biotin label is introduced into target directly as dye or biotin-bearing nucleotides. Biotin labelled
target is then bound after amplification using dye-streptavidin complexes to incorporated biotin containing nucleotides.
In DNA produced by symmetric or asymmetric PCR, fluorescent dye is linked directly to the 5' end of the primer.
[0059] Hybridization of target to the arrays tiled in Example 1, and subsequent washing, are carried out with standard
solutions of salt (SSPE, TMACl) and nonionic detergent (Triton-X100), with or without added organic solvent
(formamide). Targets and markers generating strong signals are washed under stringent hybridization conditions (37-40°C;
10% formamide; 0.25xSSPE washes) to give highly discriminating detection of the genotype. Markers giving lower
hybridization intensity are washed under less stringent conditions (<30°C; 3M TMACl, or 6xSSPE; 6x and 1x SSPE
washes) to yield highly discriminating detection of the genotype.

[0060] Detection of one polymorphic marker is illustrated in Figure 3. Specifically, a typical detection block is shown
for the polymorphism denoted WI-1959, having the sequence 5'-ACCAAAAATCAGTC[T/C]GGGTAACTGAGAGTG-3'
(SEQ ID NO: 2) with the polymorphism indicated by the brackets (Figure 3, top), for which all three genotypes are
available (T/C heterozygote, C/C homozygote and T/T homozygote). The expected hybridization patterns for the
homozygote and heterozygote targets are shown in Figure 3, bottom. Three chips were tiled with each chip including the
illustrated detection block. Each block contained probes having the substituted bases at the 7th, 10th and 13th positions
from the 3' end of 20-mer oligonucleotide probes (20/7, 20/10 and 20/13, respectively). These alternate detection blocks
were tiled to provide a variety of sequences flanking the polymorphism itself, to ensure at least one detection block
hybridizing with a sufficiently low background intensity for adequate detection.

[0061] Fluorouracil containing RNA was synthesized from a T7 promoter on the upstream primer, hybridized to the
detection array in 6xSSPE + Triton-X100 at 50°C, and washed in 0.25xSSPE at room temperature. As shown in the
scan in Figure 3, middle, fluorescent scans of the arrays correctly identified the 5 homozygote or 10 heterozygote features.
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APPENDIX A 
SOFTWARE APPENDIX

# fullcel.awk #
# takes input from a POLYchip CEL file (115 x 130) and #
# extracts ratio information for every block on the chip #

BEGIN {
ratpatcutoff = 1.2
pattoggle = "yes"
base[0] = "T"
base[1] = "G"
base[2] = "C"
base[3] = "A"
name[0,0] = "WI-563"
hex[0,0] = "CTAGCC"
name[1,0] = "WI-567"
hex[1,0] = "TCAGAG"
name[2,0] = "WI-sg?"
hex[2,0] = "TCGATA"
hex[3,0] = "AACTAA"
name[4,0] = "WI-901"
hex[4,0] = "CTTGAG"
name[5,0] = "WI-802"
hex[5,0] = "CATCCT"
name[6,0] = "WI-IOSS"
hex[6,0] = "CAGATA"
name[7,0] = "WI-1147"
hex[7,0] = "ACGAGC"
name[8,0] = "WI-132S"
hex[8,0] = "CTCEAC"
name[9,0] = "WI-1417"
hex[9,0] = "GTCTTT"
name[0,1] = "WI-1796"
hex[0,1] = "AAAGTG"
name[1,1] = "WI-ia25"
hex[1,1] = "GTCTTC"
name[2,1] = "WI-1879"
hex[2,1] = "TACTGT"
name[3,1] = "WI-1888"
hex[3,1] = "ATGACA"
name[4,1] = "WI-1912"
hex[4,1] = "TTOTT"
name[5,1] = "WI-ISSS"
hex[5,1] = "TCTCG0"
name[6,1] = "WI-1741"
hex[6,1] = "GAAGGC"
name[7,1] = "WI-1760"
hex[7,1] = "ACCACA"
name[8,1] = "WI-1799"
hex[8,1] = "TCGATA"
name[9,1] = "WI-1973"
hex[9,1] = "CAAGAG"
name[0,2] = "WI-1980"
hex[0,2] = "AACTCA"
name[1,2] = "WI-2015"
hex[1,2] = "GACTGT"
name[2,2] = "WI-2664"
hex[2,2] = "GGAGAG"
name[3,2] = "WI-4013"
hex[3,2] = "CTW?rG"
name[4,2] = "WI-7567"
hex[4,2] = "XAfiTGX"
name[5,2] = "WI-1159S"
hex[5,2] = "TAGAGC"
name[6,2] = "CM4.16"
hex[6,2] = "GATAAT"
name[7,2] = "WI-6704"
hex[7,2] = "ACTCCA"
name[8,2] = "WI-6731"
hex[8,2] = "GGCACA"
name[9,2] = "WI-6787"
hex[9,2] = "ACAGTT"
name[0,3] = "WI-6910"
hex[0,3] = "TAGTTG"
name[1,3] = "WI-9518"
name[2,3] = "ADH3"
hex[2,3] = "ATAGTT"
name[3,3] = "ACT"
hex[3,3] = "GACTGG"
name[4,3] = "ALDOB-1"
hex[4,3] = "TTCTGG"
name[5,3] = "ALDOB-2"
hex[5,3] = "CCAGAT"
name[6,3] = "APOB"
hex[6,3] = "ACTCCT"
name[7,3] = "APOE(152T/C)"
hex[7,3] = "TCTCGC"
name[8,3] = "APOE(290T/C)"
hex[8,3] = "AGTCGC"
name[9,3] = "ARSB"
hex[9,3] = "TCGATG"
name[0,4] = "ATla"
hex[0,4] = "CTTCCC"
name[1,4] = "ATlb"
hex[1,4] = "GCACXT"
name[2,4] = "BCU"
hex[2,4] = "ACGAjGG"
name[3,4] = "BRCA1a"
hex[3,4] = "CATCTO"
name[4,4] = "BRCA1b"
hex[4,4] = "AGAOAG"
name[5,4] = "BRCA1c"
hex[5,4] = "GAAGAC"
name[6,4] = "D3S2"
hex[6,4] = "CCAGGT"
name[7,4] = "D3S11"
hex[7,4] = "TCTGRR"
name[8,4] = "D3S12"
hex[8,4] = "CCAGGG"
name[9,4] = "DRDa"
hex[9,4] = "CACTGG"
name[0,5] = "FABP2"
hex[0,5] = "GCGACT"
name[1,5] = "GCK"
hex[1,5] = "GAGACA"
name[2,5] = "HT2"
hex[2,5] = "CTGTGG"
name[3,5] = "HT4"
hex[3,5] = "TGCAAT"
name[4,5] = "HT5"
hex[4,5] = "ACTCGA"
name[5,5] = "IGF2"
hex[5,5] = "GGGACC"
name[6,5] = "IGHV4-6"
hex[6,5] = "TCTCGA"
name[7,5] = "INS"
hex[7,5] = "TCTACC"
name[8,5] = "LDLR"
hex[8,5] = "GGCTAA"
name[9,5] = "1^79"
hex[9,5] = "CCAGGG"
name[0,6] = "LFL"
hex[0,6] = "ACCTAG"
name[1,6] = "MCC"
hex[1,6] = "GCCTGA"
name[2,6] = "METK"
hex[2,6] = "CCCTCG"
name[3,6] = "HRAMF"
hex[3,6] = "CAGATG"
name[4,6] = "PAR"
hex[4,6] = "ACATTG"
name[5,6] = "Per/RDS"
hex[5,6] = "GAAGGA"
name[6,6] = "PPPSRl"
hex[6,6] = "OACTAA"
name[7,6] = "RDS"
hex[7,6] = "AGGACG"
name[8,6] = "sl4544"
hex[8,6] = "TCTGCT"
name[9,6] = "SXaOA"
hex[9,6] = "GGCATG"
name[0,7] = "TCR-CAl"
hex[0,7] = "TGCGGT"
name[1,7] = "TCR-CB22"
hex[1,7] = "QGCTGG"
name[2,7] = "TCR-CB23"
hex[2,7] = "CTCTAC"
name[3,7] = "TCR-CB24"
hex[3,7] = "GTGATG"
name[4,7] = "TCR-CB25"
hex[4,7] = "GTAfiCC"
name[5,7] = "TCR-CB27"
hex[5,7] = "ACCTTA"
name[6,7] = "VB12a"
hex[6,7] = "ACAGrG"
name[7,7] = "VB12b"
hex[7,7] = "CACTCA"
bkgsum = 0
bkgnum = 0
}
{

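    # skip header lines: only rows whose first two fields are numeric are read
    readthis = 1
    if ($1 ~ /[A-Za-z]/ || $2 ~ /[A-Za-z]/) readthis = 0
    if (readthis == 1) rawdata[$1, $2] = $3

    # keep cells inside the 10 x 8 grid of blocks (the last row holds 8 blocks)
    if ($1>2 && $2>4) if ($1<113 && $2<125) if ($1<90 || $2<109)
    {
        px = int(($1-3)/11)
        py = int(($2-5)/15)
        pxo = (11*px)+3
        pyo = (15*py)+5
        mx = $1-pxo
        by = $2-pyo
        block = 3*(int(by/5))+7
        if (by%5 != 4 && mx != 10)
        {
            # probe cell: store its intensity by block, substituted base and column
            sb = base[by%5]
            sig[px,py,block,sb,mx] = $3
        }
        if (by%5 == 4 || mx == 10)
        {
            # blank cell: accumulate background
            bkgsum += $3
            bkgnum++
        }
    }
}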

END {
printf("background = %5.2f\n", bkgsum/bkgnum)
printf "MARKER\tBSTBLK\tRATIO\t\tDB\tCHECK\t\tPATRAT\n"
for (py=0;py<8;py++) for (px=0;px<10;px++) if (py < 7 || px < 8)
{
    # expected base(s) for each of the 10 columns of a detection block
    m[0] = substr(hex[px,py],1,1)
    m[1] = substr(hex[px,py],1,1)
    m[2] = substr(hex[px,py],2,1)
    m[3] = substr(hex[px,py],2,1)
    m[4] = substr(hex[px,py],3,2)
    m[5] = substr(hex[px,py],3,2)
    m[6] = substr(hex[px,py],5,1)
    m[7] = substr(hex[px,py],5,1)
    m[8] = substr(hex[px,py],6,1)
    m[9] = substr(hex[px,py],6,1)
    center = substr(hex[px,py],3,1) "/" substr(hex[px,py],4,1)
    pentamer = m[0] m[2] "[" center "]" m[6] m[8]
    header = "[" (px+1) "," (py+1) "] " name[px,py] "\n" pentamer "\n"
    headprint = 0
    {
    for (j=0;j<=2;j++)
    {
        block = (3*j)+7
        num2 = 0
        den2 = 0
        num1 = 0
        den1 = 0
        x2 = 0
        n1 = 0
        n2 = 0
        for (f=0;f<5;f++)
        {
            maxhi[px,py,block,f] = 0
            for (g=0;g<4;g++) maxlo[px,py,block,g,f] = 0
        }
        for (k=0;k<=9;k++) for (b=0;b<=3;b++)
        {
            z = int(k/2)
            signal = sig[px,py,block,base[b],k]
            # omit = 1 marks a cell whose substituted base matches the expected base
            omit = 0
            if (m[k] ~ base[b]) omit = 1
            if (omit == 1)
            {
                q = maxhi[px,py,block,z]
                if (signal > q) maxhi[px,py,block,z] = signal
            }
            if (omit == 0)
            {
                q = maxlo[px,py,block,b,z]
                if (signal > q) maxlo[px,py,block,b,z] = signal
                if (k%2 == 0)
                {
                    # mismatch cell in a column of the first allele
                    num2 += signal
                    x2 += (signal)^2
                    n1++
                }
                if (k%2 == 1)
                {
                    # mismatch cell in a column of the second allele
                    den2 += signal
                    x2 += (signal)^2
                    n2++
                }
            }
            if (omit == 1) if (k==4 || k==5)
            {
                # perfect match cells at the polymorphic position
                if (base[b] == substr(hex[px,py],3,1))
                {
                    num1 += signal
                }
                if (base[b] == substr(hex[px,py],4,1))
                {
                    den1 += signal
                }
            }
        }
        maxhisum = 0
        for (f=0;f<5;f++)
        {
            maxhisum += maxhi[px,py,block,f]
        }
        maxhiav = maxhisum/5
        maxlosum = 0
        for (g=0;g<5;g++) for (v=0;v<4;v++)
        {
            maxlosum += maxlo[px,py,block,v,g]
        }
        maxloav = maxlosum/14
        maxrat = maxhiav/maxloav
        num = ((num1/2)-(num2/n1))
        if (num < 0) num = 0
        den = ((den1/2)-(den2/n2))
        if (den <= 0) den = 0.001
        ratio = num/den
        max = num1/2
        if (den1/2 > max) max = den1/2
        n = n1+n2
        stdvxnum = ((n*x2) - (num2+den2)^2)
        if (stdvxnum < 0) stdvx = 0
        stdvx = (stdvxnum/(n^2))^(0.5)
        if (maxrat > ratpatcutoff || pattoggle == "no")
        {
            if (headprint == 0)
            {
                printf header
                headprint = 1
            }
            printf "\t20/" block "\t"
            printf("%1.2f\t", ratio)
            if (ratio < 10000) printf "\t"
            rat = ratio
            if (ratio == 0) rat = .00001
            lograt = log(rat)/log(10)
            printf("%2.2f\t", 10*lograt)
            printf("%2.2f", max/stdvx)
            if (max/stdvx < 2) printf "\tFAIL\t"
            if (max/stdvx >= 2) printf "\t\t"
            printf("%2.2f", maxrat)
            if (maxrat > ratpatcutoff) printf "\tGOODPAT"
            printf "\n"
        }
    }
    }
}
}


Claims 

1. A method of identifying whether a target nucleic acid sequence includes a polymorphic variant comprising:

hybridising said target nucleic acid sequence to an array of oligonucleotide probes, said array comprising at
least one detection block of probes, said detection block including first and second groups of probes that are
complementary to said target nucleic acid sequence having first and second variants of said polymorphism,
respectively, and further comprising third and fourth groups of probes, said third and fourth groups of probes
having sequences identical to said first and second groups of probes, respectively, except that said third and
fourth groups of probes include all possible monosubstitutions of positions in said sequence that are within n
bases of a base in said sequence that is complementary to said polymorphism, wherein n is from 1 to 5;

determining hybridisation intensities of probes in the group;

calculating a ratio (PM(x) - MM(x)) / (PM(y) - MM(y)), wherein PM(x) is the average hybridisation intensity of probes
that are perfectly complementary to the first variant of the polymorphism, MM(x) is the average hybridisation
intensity of probes that are complementary to the first variant except for a single mismatch, PM(y) is the average
hybridisation intensity of probes that are perfectly complementary to the second variant of the polymorphism,
MM(y) is the average hybridisation intensity of probes that are complementary to the second variant except
for a single mismatch; and

characterising the polymorphism as homozygous for the first variant, homozygous for the second variant or
heterozygous for the first and second variants from the ratio.

2. A method according to claim 1, further comprising determining whether PM(x) or PM(y) is greater than twice the
standard deviation of the intensities of the probes that are complementary to the first variant except for a single
mismatch or the probes that are complementary to the second variant except for a single mismatch, respectively.

3. The method of claim 1 or claim 2, wherein n is 2.

4. The method of any one of claims 1 to 3, wherein said first and second groups of probes comprise a plurality of
different probes that are complementary to overlapping portions of said target nucleic acid sequence.

5. The method of any preceding claim wherein said monosubstitutions occur at a plurality of distances from a 3' end
of said probes.

6. The method of any preceding claim wherein said detection block includes between 8 and 88 different probes.

7. The method of any preceding claim comprising between 1 and 1,000 different detection blocks, each of said
detection blocks including probes complementary to first and second variants of a different polymorphism in said
target nucleic acid sequence.

8. The method of any preceding claim wherein each of said detection blocks further comprises fifth and sixth groups
of probes, said fifth and sixth groups of probes being complementary to said first and second groups of probes,
respectively.

9. The method of any preceding claim wherein said target nucleic acid comprises a detectable label.

10. The method of claim 9, wherein said detectable label is a fluorescent group.

11. The method of claim 9, wherein said label is a binding group.

12. The method of claim 11, wherein said binding group is selected from biotin, avidin and streptavidin.

13. The method of any one of claims 1 to 7, wherein said detection block includes fifth and sixth groups of probes,
said fifth and sixth groups of probes being complementary to first and second variants of an antisense strand of
said target sequence.

14. The method of any preceding claim, wherein said step of determining comprises:

calculating a ratio of hybridization intensity of said target nucleic acid to said first group of probes versus
hybridization intensity of said target nucleic acid to said second group of probes;

and identifying a homozygote for said first variant when said ratio is greater than 2, a homozygote for said
second variant when said ratio is less than 0.5, and a heterozygote when said ratio is between about 0.7 and
1.5.



Patentanspruche 

45 1. Verfahren zum Erkennen, ob die Sequenz einer Targetnukleinsaure eine polymorphe Variante enthalt, umfassend: 

das Hybridisieren der Sequenz der Targetnukleinsaure auf eine Anordnung von Oligonukleotidproben, wobei 
die Anordnung mindestens ein Probennachweisfeld umfasst, wobei das Nachweisfeld mindestens eine erste 
und eine zweite Gruppe von Proben enthSIt, die zur Sequenz der Targetnukleinsaure, welche eine erste bzw. 

50 eine zweite Variante des Polymorphismus enthalt, komplementdr sind sowie zudem eine dritte und eine vierte 

Gruppe von Proben, deren Sequenzen identisch sind zu denen der Proben aus der ersten bzw. der zweiten 
Gruppe, nur dass die dritte und die vierte Gruppe von Proben Monosubstitutionen an alien moglichen Stellen 
in der Sequenz enthalten . die innerhalb von n Basen der einen Base von der Sequenz liegen, die komplementar 
zum Polymorphismus ist, wobei n zwischen 1 und 5 ist; 

55 Bestimmen der Hybridislerungsstarken von den Proben in der Gruppe; 

Berechnen des Verhaltnisses 
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PM(x)-MM(x) / PM(y)-MM(y) 



in dem 



PM(x) die durchschnittiiche Hybridisierungsstarke von Proben ist, die perfekt komptementar sind zur ersten 

Variante des Polynnorphismus, 
MM(x) die durchschnittiiche Hybridisierungsstarke von Proben, die bis auf einen Mismatch zur ersten Variante 

komplementar sind, 

10 PM(y) die durchschnittiiche Hybridisierungsstarke von Proben, die zur zweiten Variante des Polymorphismus 

perfekt komplementar sind, 

MM(y) die durchschnittiiche Hybridisierungsstarke von Proben, die bis auf einen Mismatch zur zweiten Variante 
komplementar sind, und 

15 Charakterisieren des Polymorphismus gemaB dem Verhaltnis als homozygot zur ersten Variante, als homo- 

zygot zur zweiten Variante Oder als heterozygot zur ersten und zur zweiten Variante. 

2. Verfahren nach Anspruch 1 , zudem umfassend die Bestimmung. ob PM(x) oder PM(y) groBer ist als das Zweifache 
der Standardabweichung von den Probenstarken, die bis auf einen Mismatch zur ersten Variante komplementar 

20 sind bzw. die bis auf einen Mismatch zur zweiten Variante komplementar sind. 

3. Verfahren nach Anspruch 1 Oder 2, wobei n gleich 2 ist. 

4. Verfahren nach irgendeinem der Anspruche 1 bis 3, wobei die erste und die zweite Probengruppe eine Anzahl 
25 verschiedener Proben umfasst. welche zu uberlappenden Bereichen der Sequenz der Targetnukleinsaure kom- 
plementar sind. 

5. Verfahren nach irgendeinem vorhergehenden Anspruch, wobei die Monosubstitutlonen in einer Anzahl von Ab- 
standen vom 3'-Ende der Proben entfemt liegen. 

30 

6. Verfahren nach irgendeinem vorhergehenden Anspruch, wobei die Nachweisfelder zwischen 8 und 88 verschie- 
dene Proben enthalten. 

7. Verfahren nach irgendeinem vorhergehenden Anspruch, umfassend zwischen 1 und 1000 verschiedene Nach- 
35 weisfelder, wobei die Nachweisfelder jeweils Proben aufweisen, die zur ersten und zur zweiten Variante des ver- 

schiedenen Polymorphismus in der Sequenz der Nukleinsaure komplementar sind. 

8. Verfahren nach irgendeinem vorhergehenden Anspruch, wobei die Nachweisfelder zudem funfte und sechste Pro- 
bengruppen aufweisen, wobei die funfte und die sechste Gruppe von Proben zur ersten bzw. zur zweiten Proben- 

40 gruppe komplementar sind. 

9. Verfahren nach irgendeinem vorhergehenden Anspruch, wobei die Targetnukleinsaure einen Nachweismarker 
aufweist. 

45 10. Verfahren nach Anspruch 9, wobei der Nachweismarker eine fluoreszierende Gruppe ist. 

11. Verfahren nach Anspruch 9, wobei der Marker eine Bindungsgruppe ist. 

12. Verfahren nach Anspruch 11. wobei die Bindungsgruppe ausgewahit ist aus Biotin. Avidin und Streptavidin. 

so 

13. Verfahren nach irgendeinen der Anspruche 1 bis 7, wobei das Nachweisfeld eine funfte und eine sechste Proben- 
gruppe umfasst, wobei die funfte und die sechste Probengruppe zur ersten und zur zweiten Variante des Anttsense- 
Strangs der Targetsequenz komplementar sind. 

55 14. Verfahren nach irgendeinem vorhergehenden Anspruch, wobei der Bestimmungsschritt umfasst: 

Berechnen des Verhaltnisses aus der Hybridisierungsstarke der Targetnukleinsaure zur ersten Probengruppe 
und der Hybridisierungsstarke der Targetnukleinsaure zur zweiten Probengruppe, und 
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Identifizieren einer Homozygote zur ersten Variante, ist das Verhaltnis groBer als 2, einer Homozygote zur 
zweiten Variante, ist das Verhaltnis weniger als 0,5 und einer Heterozygote, liegt das Verhaltnis zwischen etwa 
0,7 und 1 ,5. 

5 

Revendications 

1. Precede pour identifier si une sequence d'acide nuctelque clble comporte un variant polymorphe, comportant les 
etapes consistent a : 

w 

hybrider ladite sequence d'acide nucleique cible a une serie de sondes ollgonucleotidiques, ladite serie com- 
portant au moins un bloc de detection de sondes, ledit bloc de detection comportant des premier et deuxi^me 
groupes de sondes qui sont complementaires de ladite sequence d'acide nucleique cible ayant des premier 
et second variants dudit polymorphisme, respectivement, et comportant de plus des troisieme et quatrieme 

15 groupes de sondes, lesdits troisieme et quatrieme groupes de sondes ayant des sequences identiques auxdits 

premier et deuxieme groupes de sondes, respectivement, a I'exception que lesdits troisieme et quatrieme 
groupes de sondes comportent toutes les monosubstitutions possibles de positions dans ladite sequence qui 
sont dans n bases d'une base de ladite sequence qui est compl6mentaire dudit polymorphisme, ou n est 
compris entre 1 et 5, 

20 determiner des intensit^s d'hybridation de sondes dans le groupe, 

calculer un rapport PM(x)-MM(x) / PM(y)-MM(y), ou PM(x) est I'intensite d'hybridation moyenne de sondes 
qui sont parfaitement complementaires du premier variant du polymorphisme, MM(x) est I'intensite d'hybrida- 
tion moyenne de sondes qui sont compl6mentaires du premier variant a I'exception d'une difference unique, 
PM(y) est I'intensite d'hybridation moyenne de sondes qui sont parfaitement compl6mentaires du second va- 

25 riant du polymorphisme. MM(y) est I'intensite d'hybridation moyenne de sondes qui sont complementaires du 

second variant a I'exception d'une difference unique, et 

caracteriser le polymorphisme en tant qu'homozygote pour le premier variant, homozygote pour le second 
variant ou heterozygote pour les premier et second variants d'apres le rapport. 

30 

2. Precede selon la revendication 1 , comportant de plus la determination du fait que PM(x) ou PM(y) est superieur 
a deux fois I'ecart type des intensites des sondes qui sont complementaires du premier variant a I'exception d'une 
difference unique ou des sondes qui sont complementaires du second variant k i'exception d'une difference unique, 
respectivement. 

35 

3. Procede selon la revendication 1 ou 2, dans lequel n est egal k 2. 

4. Precede seton I'une quelconque des revendications 1 a 3, dans lequel lesdits premier et deuxieme groupes de 
sondes comportent une pluralite de sondes differentes qui sent complementaires de parties chevauchantes de 

40 ladite sequence d'acide nucieique cible. 

5. Precede selon I'une quelconque des revendications precedentes, dans lequel lesdites monosubstitutions appa- 
raissent k une pluralite de distances par rapport a une extremlte 3' desdites sondes. 

45 6. Precede selon I'une quelconque des revendications precedentes, dans leque) ledit bloc de detection comporte 
entre 8 et 88 sondes differentes. 

7. Procede selon I'une quelconque des revendications precedentes, comportant entre 1 et 1000 blocs de detection 
differents, chacun desdits blocs de detection comportant des sondes complementaires des premier et second 

50 variants d'un polymorphisme different dans ladite sequence d'acide nucieique cible. 

8. Procede selon I'une quelconque des revendications precedentes, dans lequel chacun desdits blocs de detection 
comporte de plus des cinqui§me et sixifeme groupes de sondes, lesdits cinquieme et sixieme groupes de sondes 
etant complementaires desdits premier et deuxieme groupes de sondes, respectivement. 

55 

9. Precede selon I'une quelconque des revendications precedentes. dans lequel ledit acide nucieique cible comporte 
un marqueur detectable. 
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10. Proc6d6 selon la revendication 9, dans lequel ledit marqueur detectable est un groupe fluorescent. 

11. Proced^ selon la revendication 9, dans lequel ledit marqueur est un groupe de liaison. 

1 2. Precede selon la revendication 1 1 , dans lequel ledit groupe de liaison est selectionne parmi la biotine, I'avidine et 
la streptavidine. 

13. Precede selon Tune quelconque des revendications 1 a 7, dans lequel ledit bloc de detection comporte des cin- 
quieme et sixi^me groupes de sondes, lesdits cinquieme et sixieme groupes de sondes etant complementaires 
des premier et second variants d'un brin anti-sens de ladite sequence cible. 

14. Precede selon Tune quelconque des revendications pr^cedentes, dans lequel ladite ^tape de determination com- 
porte les Stapes conslstant k : 

calculer un rapport d'intensite d'hybridation dudit acide nucleique cible audit premier groupe de sondes par 
rapport a une intensite d'hybridation dudit acide nucleique cible audit deuxieme groupe de sondes, 
et identifier un homozygote pour ledit premier variant lorsque ledit rapport est superieur k 2, un homozygote 
pour ledit second variant lorsque ledit rapport est inf^rieur k 0,5, et un h6t6rozygote lorsque ledit rapport est 
compris entre environ 0,7 et 1 ,5. 
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SEQUENCE AT POLYMORPHISM Wl'567 



TGCTGCCTTGGTTC[A/G]AGCCCTCATCTCTTT (SEQ ID NO: 1)

THREE BLOCKS OF COMPLEMENTARY
20-MER OLIGOS WITH SUBSTITUTIONS (N)
7, 10 AND 13 BP FROM THE 3' END

COLUMNS: P-2, P-1, P, P+1, P+2 (EACH WITH AN A AND A G LANE)

BLOCK 20/7:  3'-AACCAAN[C]TCGGGAGTAGAG-5' (SEQ ID NO: 3)
BLOCK 20/10: 3'-CGGAACCAAN[C]TCGGGAGTA-5' (SEQ ID NO: 4)
BLOCK 20/13: 3'-CGACGGAACCAAN[C]TCGGGA-5' (SEQ ID NO: 5)

FIG. 2B
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SEQUENCE AT POLYMORPHISM WI.567 
5'-TGCTGCCTTGGTTC[A/G]AGCCCTCATCTCTTT-3' (SEQ ID NO: 1)

SYNTHESIZE BLOCK OF COMPLEMENTARY
20-MER OLIGOS WITH SUBSTITUTIONS (N)
10 BP FROM THE 3' END

3'-ACGGAACCANG[T]TCGGGAGT-5' (SEQ ID NO: 6)
3'-ACGGAACCANG[C]TCGGGAGT-5'
3'-CGGAACCAAN[T]TCGGGAGTA-5'
3'-CGGAACCAAN[C]TCGGGAGTA-5'
3'-GGAACCAAG[N]TCGGGAGTAG-5'
3'-GGAACCAAG[N]TCGGGAGTAG-5'
3'-GAACCAAG[T]NCGGGAGTAGA-5' (SEQ ID NO: 10)
3'-GAACCAAG[C]NCGGGAGTAGA-5'
3'-AACCAAG[T]TNGGGAGTAGAG-5' (SEQ ID NO: 12)
3'-AACCAAG[C]TNGGGAGTAGAG-5'

COLUMNS AGAGAGAGAG = POLYMORPHISM; ROWS T, G, C, A

PREDICTED PATTERNS: A/A, G/G, A/G

FIG. 3
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